{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quick Get Started Notebook for Intel® Neural Compressor with PyTorch\n",
    "\n",
    "\n",
    "This notebook provides an easy-to-follow guide for getting started with the [Intel® Neural Compressor](https://github.com/intel/neural-compressor) (INC) library for the [PyTorch](https://github.com/pytorch/pytorch) framework.\n",
    "\n",
    "In the following sections, we use a BERT model as an example, following the [`run_glue_no_trainer.py` script](https://github.com/huggingface/transformers/blob/v4.53.1/examples/pytorch/text-classification/run_glue_no_trainer.py), to demonstrate how to apply post-training static quantization to Hugging Face Transformers models with INC.\n",
    "\n",
    "\n",
    "The main objectives of this notebook are:\n",
    "\n",
    "1. Prerequisite: Prepare the necessary environment, model, and dataset.\n",
    "2. Quantization with INC: Walk through applying post-training static quantization step by step.\n",
    "\n",
    "\n",
    "## 1. Prerequisite\n",
    "\n",
    "### 1.1 Environment\n",
    "\n",
    "If you already have Jupyter Notebook, you can run this notebook directly. We will use pip to install or upgrade [neural-compressor](https://github.com/intel/neural-compressor), [PyTorch](https://github.com/pytorch/pytorch), and the other required packages.\n",
    "\n",
    "Otherwise, you can set up a new environment. First, install [Anaconda](https://www.anaconda.com/distribution/). Then open an Anaconda prompt window and run the following commands:\n",
    "\n",
    "```shell\n",
    "conda create -n inc_notebook python=3.10\n",
    "conda activate inc_notebook\n",
    "pip install jupyter\n",
    "jupyter notebook\n",
    "```\n",
    "The last command launches Jupyter Notebook, and we can open this notebook in a browser to continue.\n",
    "\n",
    "Next, let's install the necessary packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install neural-compressor from source\n",
    "import sys\n",
    "!git clone https://github.com/intel/neural-compressor.git\n",
    "%cd ./neural-compressor\n",
    "!{sys.executable} -m pip install -r requirements.txt\n",
    "!{sys.executable} setup.py install\n",
    "%cd ..\n",
    "\n",
    "# alternatively, install the stable release from PyPI (only one of the two installs is needed)\n",
    "!{sys.executable} -m pip install neural-compressor\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install other packages used in this notebook.\n",
    "!{sys.executable} -m pip install -r requirements.txt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Load Dataset\n",
    "\n",
    "The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine classification tasks on sentences or pairs of sentences:\n",
    "\n",
    "- [CoLA](https://nyu-mll.github.io/CoLA/) (Corpus of Linguistic Acceptability): Determine whether a sentence is grammatically correct.\n",
    "- [MNLI](https://arxiv.org/abs/1704.05426) (Multi-Genre Natural Language Inference): Determine whether a sentence entails, contradicts, or is unrelated to a given hypothesis. This dataset has two versions: one in which the validation and test sets come from the same distribution as the training set, and another, called mismatched, in which the validation and test sets use out-of-domain data.\n",
    "- [MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398) (Microsoft Research Paraphrase Corpus): Determine whether two sentences are paraphrases of one another.\n",
    "- [QNLI](https://rajpurkar.github.io/SQuAD-explorer/) (Question-answering Natural Language Inference): Determine whether the answer to a question is in the second sentence. This dataset is built from the SQuAD dataset.\n",
    "- [QQP](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) (Quora Question Pairs): Determine whether two questions are semantically equivalent.\n",
    "- [RTE](https://aclweb.org/aclwiki/Recognizing_Textual_Entailment) (Recognizing Textual Entailment): Determine whether a sentence entails a given hypothesis.\n",
    "- [SST-2](https://nlp.stanford.edu/sentiment/index.html) (Stanford Sentiment Treebank): Determine whether a sentence has a positive or negative sentiment.\n",
    "- [STS-B](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) (Semantic Textual Similarity Benchmark): Determine the similarity of two sentences with a score from 0 to 5.\n",
    "- [WNLI](https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html) (Winograd Natural Language Inference): Determine whether a sentence with an ambiguous pronoun entails the same sentence with that pronoun replaced. This dataset is built from the Winograd Schema Challenge dataset.\n",
    "\n",
    "Here, we use the MRPC task and load the required dataset from the Hugging Face Hub."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import evaluate\n",
    "import torch\n",
    "from datasets import load_dataset\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "from transformers import (\n",
    "    AutoConfig,\n",
    "    AutoModelForSequenceClassification,\n",
    "    AutoTokenizer,\n",
    "    default_data_collator,\n",
    ")\n",
    "from transformers.utils import check_min_version\n",
    "# Raises an error if the minimum required version of Transformers is not installed. Remove at your own risk.\n",
    "check_min_version(\"4.53.1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "task_name = 'mrpc'\n",
    "raw_datasets = load_dataset(\"nyu-mll/glue\", task_name)\n",
    "label_list = raw_datasets[\"train\"].features[\"label\"].names\n",
    "num_labels = len(label_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 Prepare Model\n",
    "Download the pretrained model [google-bert/bert-base-cased](https://huggingface.co/google-bert/bert-base-cased) and load it as a PyTorch model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at google-bert/bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "BertForSequenceClassification(\n",
       "  (bert): BertModel(\n",
       "    (embeddings): BertEmbeddings(\n",
       "      (word_embeddings): Embedding(28996, 768, padding_idx=0)\n",
       "      (position_embeddings): Embedding(512, 768)\n",
       "      (token_type_embeddings): Embedding(2, 768)\n",
       "      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "      (dropout): Dropout(p=0.1, inplace=False)\n",
       "    )\n",
       "    (encoder): BertEncoder(\n",
       "      (layer): ModuleList(\n",
       "        (0-11): 12 x BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSdpaSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "            (intermediate_act_fn): GELUActivation()\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "      )\n",
       "    )\n",
       "    (pooler): BertPooler(\n",
       "      (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "      (activation): Tanh()\n",
       "    )\n",
       "  )\n",
       "  (dropout): Dropout(p=0.1, inplace=False)\n",
       "  (classifier): Linear(in_features=768, out_features=2, bias=True)\n",
       ")"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_name = 'google-bert/bert-base-cased'\n",
    "\n",
    "config = AutoConfig.from_pretrained(\n",
    "    model_name,\n",
    "    num_labels=num_labels,\n",
    "    finetuning_task=task_name,\n",
    "    trust_remote_code=False,\n",
    ")\n",
    "tokenizer = AutoTokenizer.from_pretrained(\n",
    "    model_name,\n",
    "    use_fast=True,\n",
    "    trust_remote_code=False,\n",
    ")\n",
    "model = AutoModelForSequenceClassification.from_pretrained(\n",
    "    model_name,\n",
    "    from_tf=False,\n",
    "    config=config,\n",
    "    ignore_mismatched_sizes=False,\n",
    "    trust_remote_code=False,\n",
    ")\n",
    "model.eval()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 Dataset Preprocessing\n",
    "We need to tokenize the raw dataset and create the dataloaders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Running tokenizer on dataset: 100%|█████████████████████████████████████████████████████████████████████████| 3668/3668 [00:00<00:00, 15181.10 examples/s]\n",
      "Running tokenizer on dataset: 100%|███████████████████████████████████████████████████████████████████████████| 408/408 [00:00<00:00, 13910.21 examples/s]\n",
      "Running tokenizer on dataset: 100%|██████████████████████████████████████████████████████████████████████████| 1725/1725 [00:00<00:00, 7403.78 examples/s]\n"
     ]
    }
   ],
   "source": [
    "sentence1_key, sentence2_key = (\"sentence1\", \"sentence2\")\n",
    "padding = \"max_length\"\n",
    "max_seq_length = 128\n",
    "\n",
    "def preprocess_function(examples):\n",
    "    args = (\n",
    "        (examples[sentence1_key], examples[sentence2_key])\n",
    "    )\n",
    "    result = tokenizer(*args, padding=padding, max_length=max_seq_length, truncation=True)\n",
    "    if \"label\" in examples:\n",
    "        result[\"labels\"] = examples[\"label\"]\n",
    "    return result\n",
    "\n",
    "processed_datasets = raw_datasets.map(\n",
    "    preprocess_function,\n",
    "    batched=True,\n",
    "    remove_columns=raw_datasets[\"train\"].column_names,\n",
    "    desc=\"Running tokenizer on dataset\",\n",
    ")\n",
    "\n",
    "train_dataset = processed_datasets[\"train\"]\n",
    "eval_dataset = processed_datasets[\"validation\"]\n",
    "\n",
    "\n",
    "data_collator = default_data_collator\n",
    "\n",
    "train_dataloader = DataLoader(\n",
    "    train_dataset, shuffle=True, collate_fn=data_collator, batch_size=8\n",
    ")\n",
    "example_inputs = next(iter(train_dataloader))\n",
    "eval_dataloader = DataLoader(eval_dataset, collate_fn=data_collator, batch_size=8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Quantization with Intel® Neural Compressor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
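    "### 2.1 Define calibration function and evaluation function\n",
    "\n",
    "In this part, we define a calibration function that runs the training batches through the model, and an evaluation function that computes the GLUE metric for the MRPC task on the validation set."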
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define calibration function: forward passes over the training data for calibration\n",
    "def run_fn(model):\n",
    "    for step, batch in enumerate(train_dataloader):\n",
    "        outputs = model(**batch)\n",
    "\n",
    "# define evaluation function\n",
    "metric = evaluate.load(\"glue\", task_name)\n",
    "def eval_fn(model):\n",
    "    for step, batch in enumerate(eval_dataloader):\n",
    "        with torch.no_grad():\n",
    "            outputs = model(**batch)\n",
    "        try:\n",
    "            predictions = outputs.logits.argmax(dim=-1)\n",
    "        except (AttributeError, KeyError):\n",
    "            predictions = outputs[\"logits\"].argmax(dim=-1)\n",
    "        references = batch[\"labels\"]\n",
    "        metric.add_batch(\n",
    "            predictions=predictions,\n",
    "            references=references,\n",
    "        )\n",
    "\n",
    "    eval_metric = metric.compute()\n",
    "    print(f\"evaluate results: {eval_metric}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Run Quantization\n",
    "\n",
    "We can now quantize the model.\n",
    "\n",
    "First, we call `get_default_static_config()` to get the default configuration for post-training static quantization. Next, we call `prepare` with this configuration to prepare the model for calibration, and run the calibration function on the prepared model. Finally, we call `convert`, which performs the quantization and returns the quantized model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-07-07 03:07:50 [WARNING][auto_accelerator.py:454] Auto detect accelerator: CPU_Accelerator.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "evaluate results: {'accuracy': 0.6813725490196079, 'f1': 0.8099415204678363}\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[W707 03:07:56.893189687 OperatorEntry.cpp:154] Warning: Warning only once for all operators,  other operators may also be overridden.\n",
      "  Overriding a previously registered kernel for the same operator and the same dispatch key\n",
      "  operator: aten::_addmm_activation(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1, bool use_gelu=False) -> Tensor\n",
      "    registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6\n",
      "  dispatch key: AutocastCPU\n",
      "  previous kernel: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:327\n",
      "       new kernel: registered at /opt/workspace/ipex-cpu-dev/csrc/cpu/autocast/autocast_mode.cpp:112 (function operator())\n",
      "2025-07-07 03:07:56 [INFO][2204578354.py:12] Preparation started.\n",
      "2025-07-07 03:07:56 [WARNING][auto_accelerator.py:454] Auto detect accelerator: CPU_Accelerator.\n",
      "2025-07-07 03:07:56 [INFO][utility.py:740]  Found 12 blocks\n",
      "2025-07-07 03:07:56 [INFO][utility.py:342] Attention Blocks: 12\n",
      "2025-07-07 03:07:56 [INFO][utility.py:343] FFN Blocks: 12\n",
      "2025-07-07 03:07:56,882 - _logger.py - IPEX - WARNING - [NotSupported]BatchNorm folding failed during the prepare process.\n",
      "2025-07-07 03:07:57 [INFO][utility.py:441] Attention Blocks : \n",
      "2025-07-07 03:07:57 [INFO][utility.py:442] [['bert.encoder.layer.0.attention.self.query', 'bert.encoder.layer.0.attention.self.key', 'bert.encoder.layer.0.attention.self.value', 'bert.encoder.layer.0.attention.output.dense'], ['bert.encoder.layer.1.attention.self.query', 'bert.encoder.layer.1.attention.self.key', 'bert.encoder.layer.1.attention.self.value', 'bert.encoder.layer.1.attention.output.dense'], ['bert.encoder.layer.2.attention.self.query', 'bert.encoder.layer.2.attention.self.key', 'bert.encoder.layer.2.attention.self.value', 'bert.encoder.layer.2.attention.output.dense'], ['bert.encoder.layer.3.attention.self.query', 'bert.encoder.layer.3.attention.self.key', 'bert.encoder.layer.3.attention.self.value', 'bert.encoder.layer.3.attention.output.dense'], ['bert.encoder.layer.4.attention.self.query', 'bert.encoder.layer.4.attention.self.key', 'bert.encoder.layer.4.attention.self.value', 'bert.encoder.layer.4.attention.output.dense'], ['bert.encoder.layer.5.attention.self.query', 'bert.encoder.layer.5.attention.self.key', 'bert.encoder.layer.5.attention.self.value', 'bert.encoder.layer.5.attention.output.dense'], ['bert.encoder.layer.6.attention.self.query', 'bert.encoder.layer.6.attention.self.key', 'bert.encoder.layer.6.attention.self.value', 'bert.encoder.layer.6.attention.output.dense'], ['bert.encoder.layer.7.attention.self.query', 'bert.encoder.layer.7.attention.self.key', 'bert.encoder.layer.7.attention.self.value', 'bert.encoder.layer.7.attention.output.dense'], ['bert.encoder.layer.8.attention.self.query', 'bert.encoder.layer.8.attention.self.key', 'bert.encoder.layer.8.attention.self.value', 'bert.encoder.layer.8.attention.output.dense'], ['bert.encoder.layer.9.attention.self.query', 'bert.encoder.layer.9.attention.self.key', 'bert.encoder.layer.9.attention.self.value', 'bert.encoder.layer.9.attention.output.dense'], ['bert.encoder.layer.10.attention.self.query', 'bert.encoder.layer.10.attention.self.key', 'bert.encoder.layer.10.attention.self.value', 'bert.encoder.layer.10.attention.output.dense'], ['bert.encoder.layer.11.attention.self.query', 'bert.encoder.layer.11.attention.self.key', 'bert.encoder.layer.11.attention.self.value', 'bert.encoder.layer.11.attention.output.dense']]\n",
      "2025-07-07 03:07:57 [INFO][utility.py:443] FFN Blocks : \n",
      "2025-07-07 03:07:57 [INFO][utility.py:444] [['bert.encoder.layer.0.intermediate.dense', 'bert.encoder.layer.0.output.dense'], ['bert.encoder.layer.1.intermediate.dense', 'bert.encoder.layer.1.output.dense'], ['bert.encoder.layer.2.intermediate.dense', 'bert.encoder.layer.2.output.dense'], ['bert.encoder.layer.3.intermediate.dense', 'bert.encoder.layer.3.output.dense'], ['bert.encoder.layer.4.intermediate.dense', 'bert.encoder.layer.4.output.dense'], ['bert.encoder.layer.5.intermediate.dense', 'bert.encoder.layer.5.output.dense'], ['bert.encoder.layer.6.intermediate.dense', 'bert.encoder.layer.6.output.dense'], ['bert.encoder.layer.7.intermediate.dense', 'bert.encoder.layer.7.output.dense'], ['bert.encoder.layer.8.intermediate.dense', 'bert.encoder.layer.8.output.dense'], ['bert.encoder.layer.9.intermediate.dense', 'bert.encoder.layer.9.output.dense'], ['bert.encoder.layer.10.intermediate.dense', 'bert.encoder.layer.10.output.dense'], ['bert.encoder.layer.11.intermediate.dense', 'bert.encoder.layer.11.output.dense']]\n",
      "2025-07-07 03:07:57 [INFO][quantize.py:173] Start to prepare model with static_quant.\n",
      "2025-07-07 03:07:57 [INFO][algorithm_entry.py:209] Quantize model with the static quant algorithm.\n",
      "2025-07-07 03:07:57 [INFO][utility.py:740]  Found 12 blocks\n",
      "2025-07-07 03:07:57 [INFO][utility.py:342] Attention Blocks: 12\n",
      "2025-07-07 03:07:57 [INFO][utility.py:343] FFN Blocks: 12\n",
      "2025-07-07 03:07:57 [INFO][utility.py:441] Attention Blocks : \n",
      "2025-07-07 03:07:57 [INFO][utility.py:442] [['bert.encoder.layer.0.attention.self.query', 'bert.encoder.layer.0.attention.self.key', 'bert.encoder.layer.0.attention.self.value', 'bert.encoder.layer.0.attention.output.dense'], ['bert.encoder.layer.1.attention.self.query', 'bert.encoder.layer.1.attention.self.key', 'bert.encoder.layer.1.attention.self.value', 'bert.encoder.layer.1.attention.output.dense'], ['bert.encoder.layer.2.attention.self.query', 'bert.encoder.layer.2.attention.self.key', 'bert.encoder.layer.2.attention.self.value', 'bert.encoder.layer.2.attention.output.dense'], ['bert.encoder.layer.3.attention.self.query', 'bert.encoder.layer.3.attention.self.key', 'bert.encoder.layer.3.attention.self.value', 'bert.encoder.layer.3.attention.output.dense'], ['bert.encoder.layer.4.attention.self.query', 'bert.encoder.layer.4.attention.self.key', 'bert.encoder.layer.4.attention.self.value', 'bert.encoder.layer.4.attention.output.dense'], ['bert.encoder.layer.5.attention.self.query', 'bert.encoder.layer.5.attention.self.key', 'bert.encoder.layer.5.attention.self.value', 'bert.encoder.layer.5.attention.output.dense'], ['bert.encoder.layer.6.attention.self.query', 'bert.encoder.layer.6.attention.self.key', 'bert.encoder.layer.6.attention.self.value', 'bert.encoder.layer.6.attention.output.dense'], ['bert.encoder.layer.7.attention.self.query', 'bert.encoder.layer.7.attention.self.key', 'bert.encoder.layer.7.attention.self.value', 'bert.encoder.layer.7.attention.output.dense'], ['bert.encoder.layer.8.attention.self.query', 'bert.encoder.layer.8.attention.self.key', 'bert.encoder.layer.8.attention.self.value', 'bert.encoder.layer.8.attention.output.dense'], ['bert.encoder.layer.9.attention.self.query', 'bert.encoder.layer.9.attention.self.key', 'bert.encoder.layer.9.attention.self.value', 'bert.encoder.layer.9.attention.output.dense'], ['bert.encoder.layer.10.attention.self.query', 'bert.encoder.layer.10.attention.self.key', 'bert.encoder.layer.10.attention.self.value', 'bert.encoder.layer.10.attention.output.dense'], ['bert.encoder.layer.11.attention.self.query', 'bert.encoder.layer.11.attention.self.key', 'bert.encoder.layer.11.attention.self.value', 'bert.encoder.layer.11.attention.output.dense']]\n",
      "2025-07-07 03:07:57 [INFO][utility.py:443] FFN Blocks : \n",
      "2025-07-07 03:07:57 [INFO][utility.py:444] [['bert.encoder.layer.0.intermediate.dense', 'bert.encoder.layer.0.output.dense'], ['bert.encoder.layer.1.intermediate.dense', 'bert.encoder.layer.1.output.dense'], ['bert.encoder.layer.2.intermediate.dense', 'bert.encoder.layer.2.output.dense'], ['bert.encoder.layer.3.intermediate.dense', 'bert.encoder.layer.3.output.dense'], ['bert.encoder.layer.4.intermediate.dense', 'bert.encoder.layer.4.output.dense'], ['bert.encoder.layer.5.intermediate.dense', 'bert.encoder.layer.5.output.dense'], ['bert.encoder.layer.6.intermediate.dense', 'bert.encoder.layer.6.output.dense'], ['bert.encoder.layer.7.intermediate.dense', 'bert.encoder.layer.7.output.dense'], ['bert.encoder.layer.8.intermediate.dense', 'bert.encoder.layer.8.output.dense'], ['bert.encoder.layer.9.intermediate.dense', 'bert.encoder.layer.9.output.dense'], ['bert.encoder.layer.10.intermediate.dense', 'bert.encoder.layer.10.output.dense'], ['bert.encoder.layer.11.intermediate.dense', 'bert.encoder.layer.11.output.dense']]\n",
      "2025-07-07 03:07:58 [INFO][2204578354.py:12] Preparation end.\n",
      "2025-07-07 03:08:45 [INFO][2204578354.py:14] Conversion started.\n",
      "2025-07-07 03:08:45 [INFO][utility.py:740]  Found 12 blocks\n",
      "2025-07-07 03:08:45 [INFO][utility.py:342] Attention Blocks: 12\n",
      "2025-07-07 03:08:45 [INFO][utility.py:343] FFN Blocks: 12\n",
      "2025-07-07 03:08:45 [INFO][utility.py:441] Attention Blocks : \n",
      "2025-07-07 03:08:45 [INFO][utility.py:442] [['bert.encoder.layer.0.attention.self.query', 'bert.encoder.layer.0.attention.self.key', 'bert.encoder.layer.0.attention.self.value', 'bert.encoder.layer.0.attention.output.dense'], ['bert.encoder.layer.1.attention.self.query', 'bert.encoder.layer.1.attention.self.key', 'bert.encoder.layer.1.attention.self.value', 'bert.encoder.layer.1.attention.output.dense'], ['bert.encoder.layer.2.attention.self.query', 'bert.encoder.layer.2.attention.self.key', 'bert.encoder.layer.2.attention.self.value', 'bert.encoder.layer.2.attention.output.dense'], ['bert.encoder.layer.3.attention.self.query', 'bert.encoder.layer.3.attention.self.key', 'bert.encoder.layer.3.attention.self.value', 'bert.encoder.layer.3.attention.output.dense'], ['bert.encoder.layer.4.attention.self.query', 'bert.encoder.layer.4.attention.self.key', 'bert.encoder.layer.4.attention.self.value', 'bert.encoder.layer.4.attention.output.dense'], ['bert.encoder.layer.5.attention.self.query', 'bert.encoder.layer.5.attention.self.key', 'bert.encoder.layer.5.attention.self.value', 'bert.encoder.layer.5.attention.output.dense'], ['bert.encoder.layer.6.attention.self.query', 'bert.encoder.layer.6.attention.self.key', 'bert.encoder.layer.6.attention.self.value', 'bert.encoder.layer.6.attention.output.dense'], ['bert.encoder.layer.7.attention.self.query', 'bert.encoder.layer.7.attention.self.key', 'bert.encoder.layer.7.attention.self.value', 'bert.encoder.layer.7.attention.output.dense'], ['bert.encoder.layer.8.attention.self.query', 'bert.encoder.layer.8.attention.self.key', 'bert.encoder.layer.8.attention.self.value', 'bert.encoder.layer.8.attention.output.dense'], ['bert.encoder.layer.9.attention.self.query', 'bert.encoder.layer.9.attention.self.key', 'bert.encoder.layer.9.attention.self.value', 'bert.encoder.layer.9.attention.output.dense'], ['bert.encoder.layer.10.attention.self.query', 'bert.encoder.layer.10.attention.self.key', 'bert.encoder.layer.10.attention.self.value', 'bert.encoder.layer.10.attention.output.dense'], ['bert.encoder.layer.11.attention.self.query', 'bert.encoder.layer.11.attention.self.key', 'bert.encoder.layer.11.attention.self.value', 'bert.encoder.layer.11.attention.output.dense']]\n",
      "2025-07-07 03:08:45 [INFO][utility.py:443] FFN Blocks : \n",
      "2025-07-07 03:08:45 [INFO][utility.py:444] [['bert.encoder.layer.0.intermediate.dense', 'bert.encoder.layer.0.output.dense'], ['bert.encoder.layer.1.intermediate.dense', 'bert.encoder.layer.1.output.dense'], ['bert.encoder.layer.2.intermediate.dense', 'bert.encoder.layer.2.output.dense'], ['bert.encoder.layer.3.intermediate.dense', 'bert.encoder.layer.3.output.dense'], ['bert.encoder.layer.4.intermediate.dense', 'bert.encoder.layer.4.output.dense'], ['bert.encoder.layer.5.intermediate.dense', 'bert.encoder.layer.5.output.dense'], ['bert.encoder.layer.6.intermediate.dense', 'bert.encoder.layer.6.output.dense'], ['bert.encoder.layer.7.intermediate.dense', 'bert.encoder.layer.7.output.dense'], ['bert.encoder.layer.8.intermediate.dense', 'bert.encoder.layer.8.output.dense'], ['bert.encoder.layer.9.intermediate.dense', 'bert.encoder.layer.9.output.dense'], ['bert.encoder.layer.10.intermediate.dense', 'bert.encoder.layer.10.output.dense'], ['bert.encoder.layer.11.intermediate.dense', 'bert.encoder.layer.11.output.dense']]\n",
      "2025-07-07 03:08:45 [INFO][quantize.py:242] Start to convert model with static_quant.\n",
      "2025-07-07 03:08:45 [INFO][algorithm_entry.py:209] Quantize model with the static quant algorithm.\n",
      "/home/sdp/miniforge3/envs/changwa1/lib/python3.10/site-packages/neural_compressor/torch/algorithms/static_quant/static_quant.py:192: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
      "  with torch.cpu.amp.autocast():\n",
      "`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.\n",
      "/home/sdp/miniforge3/envs/changwa1/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:196: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.\n",
      "  inverted_mask = torch.tensor(1.0, dtype=dtype) - expanded_mask\n",
      "/home/sdp/miniforge3/envs/changwa1/lib/python3.10/site-packages/intel_extension_for_pytorch/quantization/_quantization_state_utils.py:446: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n",
      "  args, scale.item(), zp.item(), dtype\n",
      "/home/sdp/miniforge3/envs/changwa1/lib/python3.10/site-packages/intel_extension_for_pytorch/quantization/_quantization_state.py:480: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n",
      "  if scale.numel() > 1:\n",
      "2025-07-07 03:08:50 [INFO][utility.py:406] |******Mixed Precision Statistics******|\n",
      "2025-07-07 03:08:50 [INFO][utility.py:408] +---------------+-----------+----------+\n",
      "2025-07-07 03:08:50 [INFO][utility.py:408] |    Op Type    |   Total   |   INT8   |\n",
      "2025-07-07 03:08:50 [INFO][utility.py:408] +---------------+-----------+----------+\n",
      "2025-07-07 03:08:50 [INFO][utility.py:408] |     Linear    |     26    |    26    |\n",
      "2025-07-07 03:08:50 [INFO][utility.py:408] +---------------+-----------+----------+\n",
      "2025-07-07 03:08:50 [INFO][static_quant.py:172] Static quantization done.\n",
      "2025-07-07 03:08:50 [INFO][2204578354.py:14] Conversion end.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "evaluate results: {'accuracy': 0.6838235294117647, 'f1': 0.8116788321167884}\n"
     ]
    }
   ],
   "source": [
    "from neural_compressor.torch.quantization import (\n",
    "    convert,\n",
    "    get_default_static_config,\n",
    "    prepare,\n",
    ")\n",
    "\n",
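    "# evaluate the FP32 baseline\n",
    "eval_fn(model)\n",
    "# static quantization with the Intel® Extension for PyTorch (IPEX) backend\n",
    "import intel_extension_for_pytorch\n",
    "quant_config = get_default_static_config()\n",
    "# prepare the model for calibration\n",
    "prepared_model = prepare(model, quant_config=quant_config, example_inputs=example_inputs)\n",
    "# calibrate on the training data\n",
    "run_fn(prepared_model)\n",
    "# convert the calibrated model to INT8 and evaluate it\n",
    "q_model = convert(prepared_model)\n",
    "eval_fn(q_model)"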
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
